Logically-Correct Reinforcement Learning
Abstract
We propose a novel Reinforcement Learning (RL) algorithm to synthesize policies for a Markov Decision Process (MDP), such that a linear-time property is satisfied. We convert the property into a Limit Deterministic Büchi Automaton (LDBA), then construct a product MDP between the automaton and the original MDP. A reward function is then assigned to the states of the product MDP, according to the accepting conditions of the LDBA. With this reward function, RL synthesizes a policy that satisfies the property: as such, the policy synthesis procedure is “constrained” by the given specification. Additionally, we show that the RL procedure sets up an online value iteration method to calculate the maximum probability of satisfying the given property at any given state of the MDP; a convergence proof for the procedure is provided. Finally, the performance of the algorithm is evaluated via a set of numerical examples. We observe an improvement of one order of magnitude in the number of iterations required for the synthesis compared to existing approaches.
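The pipeline described in the abstract (automaton, product MDP, reward on accepting states, RL) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the five-cell corridor MDP, the property "eventually goal", the hand-written two-state automaton standing in for an LDBA, the simple reward of 1 for entering an accepting automaton state, and all function names. Plain tabular Q-learning with uniform exploration is used here as the RL step.

```python
import random

# Toy MDP (assumed for illustration): a 5-cell corridor with deterministic moves.
N_CELLS = 5
ACTIONS = (-1, +1)  # left, right

def mdp_step(s, a):
    """Move along the corridor, clipping at both ends."""
    return min(max(s + a, 0), N_CELLS - 1)

def label(s):
    """Atomic proposition observed in MDP state s."""
    return "goal" if s == N_CELLS - 1 else "none"

# Hand-written automaton for "eventually goal" (a stand-in for an LDBA):
# state 0 is initial, state 1 is accepting and absorbing.
ACCEPTING = {1}

def automaton_step(q, lab):
    return 1 if (q == 1 or lab == "goal") else 0

def product_step(state, a):
    """One product-MDP transition; reward 1 when the automaton is accepting."""
    s, q = state
    s2 = mdp_step(s, a)
    q2 = automaton_step(q, label(s2))
    reward = 1.0 if q2 in ACCEPTING else 0.0
    return (s2, q2), reward

def q_learning(episodes=2000, horizon=20, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on the product MDP with uniform random exploration."""
    rng = random.Random(seed)
    Q = {((s, q), a): 0.0
         for s in range(N_CELLS) for q in (0, 1) for a in ACTIONS}
    for _ in range(episodes):
        s0 = rng.randrange(N_CELLS)
        # The automaton reads the label of the initial MDP state.
        state = (s0, automaton_step(0, label(s0)))
        for _ in range(horizon):
            a = rng.choice(ACTIONS)  # Q-learning is off-policy
            nxt, r = product_step(state, a)
            target = r + gamma * max(Q[(nxt, b)] for b in ACTIONS)
            Q[(state, a)] += alpha * (target - Q[(state, a)])
            state = nxt
    return Q

def greedy_policy(Q):
    """Pick the highest-valued action in every product state."""
    return {(s, q): max(ACTIONS, key=lambda a: Q[((s, q), a)])
            for s in range(N_CELLS) for q in (0, 1)}
```

Calling `greedy_policy(q_learning())` yields a policy over product states `(s, q)`; in this toy setting, the learned action in every non-goal cell with the automaton still in its initial state is to move right, towards the cell labelled `goal`. A real specification would need an automaton produced by an LTL-to-LDBA translation rather than one written by hand.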
Similar resources
Graph-Based Reasoning and Reinforcement Learning for Improving Q/A Performance in Large Knowledge-Based Systems
Learning to plausibly reason with minimal user intervention could significantly improve knowledge acquisition. We describe how to integrate graph-based heuristic generalization, higher-order knowledge, and reinforcement learning to learn to produce plausible inferences with only small amounts of user training. Experiments on ResearchCyc KB contents show significant improvement in Q/A performanc...
An Adaptive Learning Game for Autistic Children using Reinforcement Learning and Fuzzy Logic
This paper presents an adapted serious game for rating social ability in children with autism spectrum disorder (ASD). The required measurements are obtained through challenges in the proposed serious game. The game uses reinforcement learning concepts to be adaptive, and it is based on fuzzy logic to evaluate the social ability level of children with ASD. The game adapts itsel...
Formation of Attention and Associative Memory based on Reinforcement Learning
An attention task, in which context information should be extracted from the first presented pattern, and the recognition answer of the second presented pattern should be generated using the context information, is employed in this paper. An Elman-type recurrent neural network is utilized to extract and keep the context information. A reinforcement signal that indicates whether the answer is corr...
Selecting the State-Representation in Reinforcement Learning
The problem of selecting the right state-representation in a reinforcement learning problem is considered. Several models (functions mapping past observations to a finite set) of the observations are given, and it is known that for at least one of these models the resulting state dynamics are indeed Markovian. Without knowing either which of the models is the correct one, or what are the prob...
Probably Approximately Correct (PAC) Exploration in Reinforcement Learning
Dissertation by Alexander L. Strehl; dissertation director: Michael Littman. Reinforcement Learning (RL) in finite state and action Markov Decision Processes is studied with an emphasis on the well-studied exploration problem. We provide a general RL framework that applies to all results in this thesis and to other ...
Journal title:
- CoRR
Volume: abs/1801.08099
Publication date: 2018